This paper presents appearance-based methods for face recognition using linear and nonlinear techniques. The linear algorithms used are Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). The two nonlinear methods used are Kernel Principal Component Analysis (KPCA) and Kernel Fisher Analysis (KFA). The linear dimensionality reduction methods encode pattern information based on second-order dependencies, while the nonlinear methods are used to handle relationships among three or more pixels. In the final stage, the Mahalanobis Cosine (MAHCOS) metric is used to measure the similarity between two images after they have been projected by the corresponding dimensionality reduction technique. The experiments showed that LDA and KFA achieve the highest performance, 93.33% on the CMC and ROC results, when used with Gabor wavelets. The overall results on the 400 images of the AT&T database showed that the performance of the linear and nonlinear algorithms can be affected by the number of classes of the images, the preprocessing of the images, and the number of face images in the test sets used for recognition.
1. INTRODUCTION

The development of robust face recognition systems is an important research area because of their wide range of applications, such as video surveillance and security systems, document control, and forensic systems [1]. In the last three decades, face identification has emerged as an important research area with many possible applications that help safeguard our everyday lives in many respects [2]. Unlike many other identification methods, face recognition does not require direct contact with an individual in order to validate their identity, which is useful for surveillance, tracking and detection systems. Data acquisition is often problematic for other biometrics: techniques that rely on hands and fingers can be rendered useless if the epidermal tissue is injured in some way (e.g., bruised or cracked). Although a number of face recognition algorithms work well in constrained environments, face recognition is still an open and difficult problem in real applications [3].

In face recognition, the appearance-based approach has been widely used [4]. Face images often have a large number of pixel values and are represented as high-dimensional vectors or arrays. Operating directly on these vectors is inefficient, leads to high computational and storage costs, and exposes many learning tasks to the curse of dimensionality. A useful subspace representation has therefore become desirable in many image processing applications. Holistic and component-based methods are the two main ways of representing facial appearance. Compared to holistic approaches, feature-based methods are less sensitive to variations in illumination and viewpoint [2]. In the holistic representation, a facial image is considered as a vector of pixels and is represented as a single point in a high-dimensional space. Subspace methods are then applied to reduce the high-dimensional data to a lower-dimensional space while retaining the intrinsic features for further classification [5], [6]. Holistic appearance-based methods can be classified as linear or nonlinear. Linear methods explicitly transform data from a high-dimensional space into a low-dimensional subspace by a linear mapping. In nonlinear techniques, explicit projections are not performed; instead, a faithful low-dimensional data matrix is obtained directly from the high-dimensional data matrix. The success of a linear or nonlinear method depends heavily on the particular choice of features used by the pattern classifier. Detailed evaluation and benchmarking of the algorithms is therefore crucial for later use.

Appearance-based face recognition methods do not perform well under poor conditions [2], [7]; even the most representative techniques frequently used for face recognition cannot always achieve the best results [7], [8]. One of the most successful techniques used for image representation is the Gabor wavelet, because it provides very strong preprocessing and feature extraction [9], [10], [11]. For this reason, Gabor wavelets are chosen in this work to provide robust face recognition. The remainder of this paper is organized as follows: Section 2 reviews face recognition methodologies, Section 3 describes the methodology of this work, Section 4 describes the matching stage, Section 5 presents the experiments and results, and Section 6 draws the conclusions.


2. RELATED STUDIES 

A major issue in face recognition is how to improve the overall performance of the employed recognition techniques [12], [13]. Most of the previous methods focused mainly on frontal face images or single-view face recognition. The problem with these early solutions was the manual computation of feature measurements and locations [14]. One of the earliest notable approaches is Eigenfaces [2]. The eigenfaces technique was developed by Sirovich and Kirby (1987) and used by Matthew Turk and Alex Pentland for face classification [13], [15], using standard linear algebra. An N×N image I is linearized into an N²-element vector, so that it represents a point in an N²-dimensional space. Recognition of a probe image is performed in a lower-dimensional space obtained with the dimensionality reduction technique PCA (Principal Component Analysis). After linearization, the mean vector is calculated. The covariance matrix is then computed in order to extract a limited number of its eigenvectors, those corresponding to the largest eigenvalues, which are called eigenfaces. Because PCA is performed only when training the system, this technique is very fast when testing new face images. PCA has been intensively exploited in face recognition applications, and many variations of it have been developed. Many other linear projection methods that perform better under some conditions have also been studied. LDA (Linear Discriminant Analysis) [6] was developed as a better technique than PCA; compared with PCA, LDA gives a higher recognition rate when a large training set is available. To provide a stronger system, PCA has been combined with LDA [16], but it has been shown in [17] that combining PCA and LDA does not always produce the desired result. ICA was introduced to provide face representations in which high-order dependencies are separated into individual coefficients, and it was expected to give better recognition performance than PCA, which depends only on second-order redundancies [18]. Subsequent results on ICA were contradictory, and it has been shown that ICA does not always perform better than PCA, or is suitable only for specific tasks [19], [20]. To overcome some of the limitations of these methods, other hybrids of the PCA, LDA and ICA algorithms were developed. Most of these newer techniques involve combinations of one or more algorithms [8], [21], [6].

As PCA and LDA fail to discover the underlying structure of face images that lie on a hidden nonlinear submanifold, the Laplacianface approach was proposed in [22] to detect this structure. During the training stage, the images were first projected to a PCA subspace so that the resulting matrix is nonsingular. Eigenvectors and eigenvalues were then computed for the generalized eigenvector problem, so that the linear mapping best preserves the estimated intrinsic geometry of the manifold in a linear sense. However, the Laplacianfaces method is based on Locality Preserving Projections (LPP), a linear method that preserves local structure and may not detect all aspects of the intrinsic nonlinear manifold structure. A novel approach based on Two-dimensional Principal Component Analysis (2DPCA) and Kernel Principal Component Analysis (KPCA) for face recognition was developed in [23]. It first performs 2DPCA to project the faces onto the feature space and then performs KPCA on the projected data. Although the system achieved a high recognition rate, it focused only on improved PCA techniques, which maximize global variance for compression rather than for classification as LDA does; the system may therefore perform worse when the number of image classes is high. A face recognition algorithm with Support Vector Machines (SVM) was presented in [24]. That paper provided a face identification system based on SVM and the Discrete Wavelet Transform (DWT), where the DWT was used to avoid an increase in computational time. However, it has been shown that nonlinear methods such as SVM do not always perform better than linear methods on real-world data sets with more complicated distributions, though they easily demonstrate their virtue on artificial nonlinear data [5]. A face recognition system based on PCA with Back Propagation Neural Networks (BPNN) was developed in [25]; Support Vector Machines have also been used for face recognition. Similarly, BPNN was used in [26]: the features of the query face image and the database face images were extracted using the Gabor transform and trained using BPNN. Generally, neural networks are nonlinear methods that encounter problems when the number of classes increases and are not suitable for single-model-image recognition tasks [21]. This paper uses Gabor wavelets for image representation with the linear and nonlinear techniques PCA, LDA, KPCA and KFA, and also studies the results of the same algorithms without the Gabor image representation.


3. METHODOLOGY 

Figure 1 shows the framework used. The first stage is preprocessing, which addresses image variability caused by illumination, facial expression and other image imperfections. This is achieved by applying Gabor wavelets in the extraction process to form the Gabor image representation of the images. The Gabor wavelets (kernels, filters) used are defined in Equation (1):


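$$\psi_{n,b}(z) = \frac{\lVert k_{n,b}\rVert^{2}}{\sigma^{2}}\, e^{-\lVert k_{n,b}\rVert^{2}\lVert z\rVert^{2}/(2\sigma^{2})} \left[ e^{i k_{n,b} \cdot z} - e^{-\sigma^{2}/2} \right] \tag{1}$$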


where $n$ and $b$ define the orientation and scale of the Gabor kernels, $z = (x, y)$, $\lVert\cdot\rVert$ denotes the norm operator, and the wave vector $k_{n,b}$ is defined as follows:


$$k_{n,b} = k_b\, e^{i\phi_n} \tag{2}$$


where $k_b = k_{\max}/f^{b}$ and $\phi_n = \pi n/8$; $k_{\max}$ is the maximum frequency and $f$ is the spacing factor between kernels in the frequency domain. The term $e^{-\sigma^{2}/2}$ removes the DC (direct current) component of the kernel, and its effect becomes negligible when the parameter $\sigma$, which determines the ratio of the Gaussian window width to the wavelength, has sufficiently large values. As in most work, Gabor wavelets at five different scales, $b \in \{0, \dots, 4\}$, and eight orientations, $n \in \{0, \dots, 7\}$, are used [27].
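As a worked example of equation (2), assume the values $k_{\max} = \pi/2$ and $f = \sqrt{2}$ that are commonly used with this kernel family (they are assumptions, not values stated in this paper). The five scales and eight orientations then give:

$$
\begin{aligned}
k_b &= \frac{k_{\max}}{f^{b}} = \frac{\pi/2}{(\sqrt{2})^{b}}: \quad k_0 = \frac{\pi}{2},\ k_1 = \frac{\pi}{2\sqrt{2}},\ k_2 = \frac{\pi}{4},\ k_3 = \frac{\pi}{4\sqrt{2}},\ k_4 = \frac{\pi}{8},\\
\phi_n &= \frac{\pi n}{8}: \quad \phi_0 = 0,\ \phi_1 = \frac{\pi}{8},\ \dots,\ \phi_7 = \frac{7\pi}{8}.
\end{aligned}
$$

Under these assumptions the kernel frequency halves every two scale steps, and the $5 \times 8$ combinations give 40 kernels in total.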

For the Gabor feature representation, the Gabor wavelet representation of an image is the convolution of the image with the family of Gabor kernels defined by (1). Let $I(x, y)$ be the gray-level distribution of an image; the convolution of image $I$ with a Gabor kernel is defined as follows:


$$O_{n,b}(z) = I(z) * \psi_{n,b}(z) \tag{3}$$


where $z = (x, y)$, $*$ denotes the convolution operator, and $O_{n,b}(z)$ is the convolution result corresponding to the Gabor kernel at orientation $n$ and scale $b$. This produces the set


$$S = \{\, O_{n,b}(z) : n \in \{0, \dots, 7\},\ b \in \{0, \dots, 4\} \,\} \tag{4}$$


Equation (4) forms the Gabor wavelet representation of the image $I(z)$. To encompass different spatial frequencies (scales), spatial localities and orientation selectivities, all the representation results are downsampled and concatenated to derive an augmented feature vector $X$ [26], [28].
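A minimal NumPy sketch of equations (1) to (4) follows; it is not the authors' code. The kernel size (31 × 31), $k_{\max} = \pi/2$, $f = \sqrt{2}$, $\sigma = 2\pi$, the use of the magnitude $|O_{n,b}(z)|$, the downsampling factor `rho = 4`, and the per-channel zero-mean/unit-variance normalization are all assumptions taken from common practice, and the function names are hypothetical.

```python
import numpy as np

def gabor_kernel(n, b, size=31, k_max=np.pi / 2, f=np.sqrt(2), sigma=2 * np.pi):
    """Complex Gabor kernel psi_{n,b} of equation (1), sampled on a size x size grid."""
    k = (k_max / f**b) * np.exp(1j * np.pi * n / 8)   # wave vector k_{n,b}, equation (2)
    kx, ky = k.real, k.imag
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    k2, z2 = kx**2 + ky**2, x**2 + y**2
    envelope = (k2 / sigma**2) * np.exp(-k2 * z2 / (2 * sigma**2))
    carrier = np.exp(1j * (kx * x + ky * y)) - np.exp(-sigma**2 / 2)  # DC term removed
    return envelope * carrier

def convolve_same(image, kernel):
    """2-D linear convolution via zero-padded FFTs, cropped to the image size (equation (3))."""
    H, W = image.shape
    kh, kw = kernel.shape
    shape = (H + kh - 1, W + kw - 1)
    full = np.fft.ifft2(np.fft.fft2(image, shape) * np.fft.fft2(kernel, shape))
    top, left = kh // 2, kw // 2
    return full[top:top + H, left:left + W]

def gabor_feature_vector(image, rho=4):
    """Augmented feature vector X: downsampled, normalized |O_{n,b}| for all 40 kernels."""
    parts = []
    for b in range(5):            # five scales
        for n in range(8):        # eight orientations
            mag = np.abs(convolve_same(image, gabor_kernel(n, b)))
            down = mag[::rho, ::rho]
            down = (down - down.mean()) / (down.std() + 1e-12)
            parts.append(down.ravel())
    return np.concatenate(parts)
```

For a 112 × 92 AT&T image and `rho = 4`, each channel keeps 28 × 23 samples, so the concatenated vector has 40 × 644 = 25,760 elements, which illustrates why the second stage needs dimensionality reduction.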

The dimension of Gaborfaces is very high. In order to reduce the dimensionality while preserving the intrinsic part of the images, dimensionality reduction techniques are used in the second stage. After passing through the preprocessing stage, the training images undergo dimensionality reduction with the linear and nonlinear techniques PCA, LDA, KPCA and KFA: the images are projected onto a subspace and their projections are stored in the database. Before matching, the probe images are likewise preprocessed and projected onto the same subspace as the gallery images using the same algorithm. The final stage is the recognition stage.
The probe image projections are compared with the stored gallery projections using the Mahalanobis Cosine distance metric: the distances from a probe image projection to all gallery image projections are calculated, and the minimum distance is taken as the similarity measure. The identity of the most similar gallery image is chosen as the recognition result, and the unknown probe image is thereby identified. The images are from the AT&T face database, which contains 400 images of 40 different subjects, with 10 samples per subject. For each subject, the first three samples are selected for training, the next four samples are reserved as the test set, and the remaining three samples are used as the evaluation set. The algorithms used are described in the following subsections.
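The per-subject split described above can be sketched as follows. This assumes, hypothetically, that the 400 images are stored subject-major (sample j of subject s at index 10·s + j); `split_att` is an illustrative helper, not part of any library.

```python
import numpy as np

def split_att(num_subjects=40, per_subject=10, n_train=3, n_test=4):
    """Return (train, test, evaluation) index arrays for a subject-major image list."""
    train, test, evaluation = [], [], []
    for s in range(num_subjects):
        idx = list(range(s * per_subject, (s + 1) * per_subject))
        train += idx[:n_train]                      # first three samples
        test += idx[n_train:n_train + n_test]       # next four samples
        evaluation += idx[n_train + n_test:]        # remaining three samples
    return np.array(train), np.array(test), np.array(evaluation)
```

With the defaults this yields 120 training, 160 test and 120 evaluation images, matching the counts reported in Section 5.1.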


3.1. Principal Component Analysis (PCA) 

Principal Component Analysis (PCA) is used to linearly project the image vectors into a lower-dimensional space. If the training set has M images, each represented as a v-dimensional vector, the PCA algorithm finds a t-dimensional subspace (t < v) whose basis vectors correspond to the directions of maximum variance in the original image space; these basis vectors are eigenvectors of the covariance matrix. All images of known faces are projected onto this face space to find sets of weights that describe the contribution of each basis vector.

By comparing the set of weights for an unknown face with the sets of weights of known faces, the face can be identified. Treating the image elements as random variables, the PCA basis vectors are defined as the eigenvectors of the scatter matrix $S_T$, defined as:

$$S_T = \sum_{i=1}^{M} (x_i - \mu)(x_i - \mu)^{T} \tag{5}$$

where $\mu$ is the mean of all M images in the training set (the mean face), $T$ denotes the transpose, and $x_i$ is the $i$th image with its columns concatenated into a vector. Figure 2 shows the mean face and the 1st, 2nd, 100th and 300th eigenfaces. The principal components are the t eigenvectors with the t largest eigenvalues, which span a t-dimensional face space [20], [29].
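The PCA step can be sketched in NumPy as below; this is an illustrative sketch, not the authors' implementation, and `pca_fit`/`pca_project` are hypothetical names. Rather than forming the v × v scatter matrix of equation (5), which is large for image vectors, it takes the SVD of the centred data matrix: the right singular vectors are the eigenvectors of $S_T$ and the squared singular values are its eigenvalues.

```python
import numpy as np

def pca_fit(X, t):
    """X: M x v training matrix (one image per row). Returns mean face, v x t basis, eigenvalues."""
    mu = X.mean(axis=0)                      # mean face
    A = X - mu
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    W = Vt[:t].T                             # eigenfaces as columns, largest eigenvalues first
    eigvals = s[:t] ** 2                     # eigenvalues of S_T = A^T A
    return mu, W, eigvals

def pca_project(X, mu, W):
    """Weights describing each image in the t-dimensional face space."""
    return (X - mu) @ W
```

The SVD route is the usual way to avoid the cost of building and diagonalizing $S_T$ directly when M is much smaller than v.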


Figure 1. The system framework (first stage: preprocessing of images with Gabor wavelets; match score using the Mahalanobis distance measure; comparison by calculating the minimum distance; final result)
Figure 2. The mean face and the eigenfaces


3.2. Linear Discriminant Analysis (LDA) 

Linear Discriminant Analysis (LDA) is used to provide better classification when the data contain a high number of classes. This is achieved by finding the representation that best separates the classes. For all samples of all classes, LDA considers the between-class scatter matrix $S_B$ and the within-class scatter matrix $S_W$, defined by


$$S_B = \sum_{i=1}^{c} M_i (\mu_i - \mu)(\mu_i - \mu)^{T} \tag{6}$$

$$S_W = \sum_{i=1}^{c} \sum_{x_k \in X_i} (x_k - \mu_i)(x_k - \mu_i)^{T} \tag{7}$$

where $M_i$ is the number of training samples in class $i$, $c$ is the number of distinct classes, $\mu$ is the overall mean, $\mu_i$ is the mean vector of the samples belonging to class $i$, and $X_i$ is the set of samples belonging to class $i$, with $x_k$ being the $k$-th image of that class; $T$ denotes the transpose. $S_B$ represents the scatter of features around the overall mean for all face classes, and $S_W$ represents the scatter of features around the mean of each face class. The goal is to maximize $S_B$ while minimizing $S_W$, in other words, to maximize the ratio $\det(W^{T} S_B W) / \det(W^{T} S_W W)$ [29]. This ratio is maximized when the column vectors of the projection matrix $W_{lda}$ are the eigenvectors of $S_W^{-1} S_B$. In order to prevent $S_W$ from becoming singular, PCA is used as a preprocessing step, and the final transformation is $W_{opt} = W_{pca} W_{lda}$ [20].
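The PCA-then-LDA procedure ($W_{opt} = W_{pca} W_{lda}$) can be sketched as follows. This is a Fisherface-style sketch, not the authors' code: `lda_fit` is a hypothetical name, and reducing to M − c PCA dimensions is the common choice for keeping $S_W$ nonsingular, assumed here rather than stated in the paper.

```python
import numpy as np

def lda_fit(X, y, n_pca=None):
    """Fisherface-style LDA. X: M x v, y: class labels. Returns mean and W_opt (v x (c-1))."""
    classes = np.unique(y)
    M, c = X.shape[0], len(classes)
    n_pca = n_pca or (M - c)                 # keeps S_W nonsingular in the PCA subspace
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W_pca = Vt[:n_pca].T
    Z = (X - mu) @ W_pca                     # PCA-projected data; its overall mean is zero
    d = Z.shape[1]
    S_B, S_W = np.zeros((d, d)), np.zeros((d, d))
    for cl in classes:
        Zc = Z[y == cl]
        mc = Zc.mean(axis=0)
        S_B += len(Zc) * np.outer(mc, mc)    # equation (6) with overall mean 0
        S_W += (Zc - mc).T @ (Zc - mc)       # equation (7)
    vals, vecs = np.linalg.eig(np.linalg.solve(S_W, S_B))   # eigenvectors of S_W^{-1} S_B
    order = np.argsort(vals.real)[::-1][:c - 1]
    W_lda = vecs[:, order].real
    return mu, W_pca @ W_lda
```

At most c − 1 discriminant directions are kept because $S_B$ has rank at most c − 1; for the 40 AT&T subjects that gives up to 39 dimensions.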


3.3. Kernel Principal Component Analysis (KPCA) 

Kernel projection techniques are used to better discriminate data with nonlinear structure. The main idea is to map the input data into a high-dimensional feature space and perform a PCA-like process as in equation (5). Kernel methods identify a linear subspace in the high-dimensional feature space rather than in the original input space; they avoid direct computation of the nonlinear mapping $\Phi$ through the "kernel trick" and derive the kernel transformation matrices from the kernel matrices of the training data. The rationale for such a nonlinear mapping comes from Cover's theorem, which states that "a complex pattern-classification problem cast in a high-dimensional space nonlinearly is more likely to be linearly separable than in a low-dimensional space" [30], [23]. Consider the set of image samples


$$\{x_k\}_{k=1}^{M}, \quad x_k \in \mathbb{R}^{n} \tag{8}$$


Kernel PCA projects each vector $x_k$ from the input space $\mathbb{R}^{n}$ to a high-dimensional feature space $\mathbb{R}^{f}$ by a nonlinear mapping function $\Phi: \mathbb{R}^{n} \rightarrow \mathbb{R}^{f}$, $f > n$. PCA is then carried out in the feature space by solving the corresponding eigenvalue problem:


$$\lambda w^{\Phi} = C^{\Phi} w^{\Phi} \tag{9}$$


where $C^{\Phi}$ is the covariance matrix of the mapped samples. All solutions $w^{\Phi}$ with $\lambda \neq 0$ lie in the span of $\Phi(x_1), \dots, \Phi(x_M)$ [29].
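A minimal kernel PCA sketch follows. The paper does not state which kernel it uses, so the Gaussian (RBF) kernel and its `gamma` parameter are assumptions, and the function names are hypothetical. Because every solution of equation (9) lies in the span of the mapped samples, each $w^{\Phi}$ is represented by dual coefficients $\alpha$, obtained from the eigenvectors of the centred kernel matrix and rescaled so that $w^{\Phi}$ has unit norm.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """k(a, b) = exp(-gamma * ||a - b||^2) for all row pairs (assumed kernel choice)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kpca_fit(X, t, gamma):
    M = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    one = np.full((M, M), 1.0 / M)
    Kc = K - one @ K - K @ one + one @ K @ one       # centre Phi(x) in feature space
    vals, vecs = np.linalg.eigh(Kc)                  # ascending eigenvalues
    vals, vecs = vals[::-1][:t], vecs[:, ::-1][:, :t]
    alphas = vecs / np.sqrt(vals)                    # unit-norm w^Phi in feature space
    return K, alphas

def kpca_project(X_train, K, alphas, X_new, gamma):
    """Project new samples onto the w^Phi, applying the same feature-space centring."""
    M = X_train.shape[0]
    k = rbf_kernel(X_new, X_train, gamma)
    one_M = np.full((M, M), 1.0 / M)
    one_N = np.full((X_new.shape[0], M), 1.0 / M)
    kc = k - one_N @ K - k @ one_M + one_N @ K @ one_M
    return kc @ alphas
```

Only kernel evaluations $k(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$ are ever computed, which is the "kernel trick" described above.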
3.4. Kernel Fisher Analysis (KFA)

KFA is used to reduce the data into a lower-dimensional subspace and is designed to work better than the linear methods when the data lie on complex manifolds and there is a high number of classes. It follows a procedure similar to KPCA, except that the Fisher Linear Discriminant (FLD) is applied instead of PCA after the data are mapped to the higher-dimensional space. With $x_k$ defined as in equation (8) [31], each vector is mapped by the same nonlinear function $\Phi: \mathbb{R}^{n} \rightarrow \mathbb{R}^{f}$, $f > n$. Let the projected samples $\Phi(x)$ be centred in $\mathbb{R}^{f}$, and let the Fisher Linear Discriminant Analysis (FLD) equations be formulated using dot products only. Let $S_W^{\Phi}$ and $S_B^{\Phi}$ be the within-class and between-class scatter matrices in the feature space. To apply FLD in kernel space, the eigenvalues $\lambda$ and eigenvectors $w^{\Phi}$ of


$$\lambda S_W^{\Phi} w^{\Phi} = S_B^{\Phi} w^{\Phi} \tag{10}$$


are derived by finding the eigenvectors corresponding to the largest generalized eigenvalues. The kernel function is introduced, defined by


$$(k_{rs})_{tu} = k(x_{tr}, x_{us}) = \Phi(x_{tr}) \cdot \Phi(x_{us}) \tag{11}$$


for a $c$-class problem, where $x_{tr}$ and $x_{us}$ are the $r$-th sample of class $t$ and the $s$-th sample of class $u$, respectively (class $t$ has $l_t$ samples and class $u$ has $l_u$ samples). Finally, $\Phi(x)$ is projected onto a lower-dimensional space spanned by the eigenvectors $w^{\Phi}$, in a way similar to Kernel PCA, following the Fisherface method for face recognition [7], [29].
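The sketch below shows one way to solve the generalized eigenproblem (10) using only kernel values (11). It is a generic regularized multi-class kernel Fisher discriminant in dual form, not necessarily the exact KFA of [31]: the RBF kernel, `gamma`, the ridge term `reg` added to the within-class matrix, and the function names are all assumptions.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """k(a, b) = exp(-gamma * ||a - b||^2) for all row pairs (assumed kernel choice)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kfa_fit(X, y, gamma=1e-3, reg=1e-3):
    """Dual coefficients alpha (n x (c-1)) with w^Phi = sum_i alpha_i Phi(x_i)."""
    K = rbf_kernel(X, X, gamma)
    n = K.shape[0]
    classes = np.unique(y)
    m_all = K.mean(axis=1)                   # kernel image of the overall mean
    Mb = np.zeros((n, n))                    # between-class matrix in dual form
    Nw = np.zeros((n, n))                    # within-class matrix in dual form
    for cl in classes:
        idx = np.where(y == cl)[0]
        Kc = K[:, idx]                       # n x l_c block of kernel values
        l_c = len(idx)
        d = (Kc.mean(axis=1) - m_all)[:, None]
        Mb += l_c * (d @ d.T)
        centring = np.eye(l_c) - np.full((l_c, l_c), 1.0 / l_c)
        Nw += Kc @ centring @ Kc.T
    Nw += reg * np.eye(n)                    # regularization keeps Nw invertible
    vals, vecs = np.linalg.eig(np.linalg.solve(Nw, Mb))
    order = np.argsort(vals.real)[::-1][:len(classes) - 1]
    return vecs[:, order].real

def kfa_transform(X_train, alphas, X_new, gamma=1e-3):
    """Project Phi(x_new) onto each w^Phi via sum_i alpha_i k(x_i, x_new)."""
    return rbf_kernel(X_new, X_train, gamma) @ alphas
```

As with LDA, at most c − 1 discriminant directions carry between-class information, so only those are kept.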


4. MATCHING 

For the matching task, the Mahalanobis Cosine (MAHCOS) distance metric is used, because it is reported in [32] to be the most accurate and efficient in terms of verification, identification and robustness. It measures the cosine of the angle between image vectors that have been projected into the recognition space by the corresponding dimensionality reduction technique. After the transformations are completed, the Mahalanobis-space representation is used to compute the similarity between the features of two faces. For images $u$ and $v$ with corresponding projections $m$ and $n$ in Mahalanobis space, where $m$ and $n$ are the two feature vectors transformed into Mahalanobis space, the Mahalanobis Cosine is [33]:


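$$S_{MahCosine}(u, v) = \cos(\theta_{mn}) = \frac{\lvert m\rvert\, \lvert n\rvert \cos(\theta_{mn})}{\lvert m\rvert\, \lvert n\rvert} = \frac{m \cdot n}{\lvert m\rvert\, \lvert n\rvert} \tag{12}$$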
where $\theta_{mn}$ is the angle between the images after they have been projected into the recognition space. The corresponding distance between projected images is referred to as the MahCosine distance.
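A minimal sketch of MahCosine matching follows, with hypothetical function names. Two conventions are assumptions taken from common practice rather than from this paper: "Mahalanobis space" is obtained by dividing each projected coordinate by the square root of the corresponding eigenvalue, and the distance is the negative of equation (12), so that the minimum distance picks the most similar gallery image.

```python
import numpy as np

def mahcosine_distance(P_probe, P_gallery, eigvals):
    """Distance = -cos(theta_mn) between projections rescaled into Mahalanobis space."""
    scale = 1.0 / np.sqrt(eigvals)
    m = P_probe * scale
    n = P_gallery * scale
    m = m / np.linalg.norm(m, axis=1, keepdims=True)
    n = n / np.linalg.norm(n, axis=1, keepdims=True)
    return -(m @ n.T)                          # probes x gallery matrix

def identify(P_probe, P_gallery, gallery_ids, eigvals):
    """Assign each probe the identity of the gallery image at minimum distance."""
    D = mahcosine_distance(P_probe, P_gallery, eigvals)
    return gallery_ids[np.argmin(D, axis=1)]
```

For example, with eigenvalues (4, 1), the gallery projection (2, 0) and the probe (2, 0.1) both rescale to nearly the same direction, so the probe is assigned that gallery identity.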


5. EVALUATION AND RESULTS

To test the performance of each algorithm, three different performance metrics are used, with and without Gabor wavelets: (a) the Cumulative Match Curve (CMC), (b) the Receiver Operating Characteristic (ROC) curve, and (c) the Expected Performance Curve (EPC). The CMC is used to calculate the recognition rate: the horizontal axis represents the rank and the vertical axis represents the cumulative match score at that rank, so a lower curve corresponds to a face recognition technique with lower performance. The ROC curve is a more general measure of face recognition performance: the horizontal axis represents the false accept rate (FAR), while the vertical axis corresponds to the face verification rate (FVR). The EPC compares classifiers in terms of the trade-off between false acceptance and false rejection probabilities. Producing an EPC requires both an evaluation image set and a test image set. For each value of $\alpha$, the decision threshold that minimizes the weighted sum $\alpha\,\mathrm{FAR} + (1 - \alpha)\,\mathrm{FRR}$ of the False Acceptance Rate (FAR) and the False Rejection Rate (FRR) is computed on the evaluation image set. This threshold is then used on the test images to determine the half total error rate, $\mathrm{HTER} = (\mathrm{FAR} + \mathrm{FRR})/2$. The EPC then plots the HTER against the parameter $\alpha$, which controls the relative importance of the two error rates FAR and FRR [34].
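The CMC and EPC procedures above can be sketched as follows; the function names are hypothetical. Assumptions: the CMC takes a probes × gallery distance matrix (smaller is more similar), verification scores are similarities where a claim is accepted when score ≥ threshold, and the candidate thresholds are the observed evaluation scores plus a reject-all threshold.

```python
import numpy as np

def cmc_curve(dist, probe_ids, gallery_ids):
    """Cumulative match score at ranks 1..G from a probes x gallery distance matrix."""
    order = np.argsort(dist, axis=1)                   # gallery sorted by distance per probe
    hits = gallery_ids[order] == probe_ids[:, None]    # where the correct identity appears
    first = hits.argmax(axis=1)                        # 0-based rank of first correct match
    found = hits.any(axis=1)
    return np.array([np.mean(found & (first <= r)) for r in range(dist.shape[1])])

def far_frr(genuine, impostor, thr):
    """Accept when similarity >= thr."""
    return np.mean(impostor >= thr), np.mean(genuine < thr)

def epc(gen_eval, imp_eval, gen_test, imp_test, alphas):
    """HTER on the test set at the threshold minimizing a*FAR + (1-a)*FRR on the evaluation set."""
    cand = np.append(np.unique(np.concatenate([gen_eval, imp_eval])), np.inf)
    hters = []
    for a in alphas:
        costs = []
        for t in cand:
            far, frr = far_frr(gen_eval, imp_eval, t)
            costs.append(a * far + (1 - a) * frr)
        thr = cand[int(np.argmin(costs))]              # threshold chosen on evaluation set
        far, frr = far_frr(gen_test, imp_test, thr)    # error rates measured on test set
        hters.append(0.5 * (far + frr))
    return np.array(hters)
```

The rank-one value `cmc_curve(...)[0]` corresponds to the rank-one recognition rate (RR) reported in Table 1.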
5.1. Interpretation of Results

Table 1 gives a detailed summary of the results. The 400 images contain 10 different images of each person; 120 images are used for training, 160 images for testing, and the remaining 120 images serve as the evaluation set. LDA outperformed the other methods in most cases and has a recognition rate very close to that of KFA. Both LDA and KFA have the highest performance, 93.33%, on the CMC and ROC results when the evaluation set is used with Gabor wavelets. With Gabor wavelets and more test images (i.e., when the test set is used), their CMC and ROC performance is 91.88% and 93.13%, respectively. The high performance of KFA and LDA can be attributed to the high number of classes in the system (the database has 40 classes with 10 images per class). The results also show that incorporating the Gabor image representation makes a notable contribution to the overall face recognition performance of all the algorithms. This is because Gabor wavelet representations are, to some extent, insensitive to image variabilities. When more test images are used with the Gabor wavelets, the performance of all the algorithms generally decreases, and the decrease is larger for the PCA-based algorithms (PCA and KPCA) than for the LDA-based algorithms (LDA and KFA). For example, with Gabor wavelets, the ROC recognition rates of PCA and KPCA on the evaluation set (which contains 120 probe images) are both 92.50%, and they decrease to 63.13% and 56.88%, respectively, when the test set (which contains 160 images) is used. With Gabor wavelets, the ROC performance of LDA and KFA is 93.33% for both when the evaluation set is used, and it decreases only to 91.88% and 93.13%, respectively, when the test set is used. This shows that the LDA-based algorithms still perform better when the number of test (probe) images is increased. KPCA performs worst, with the highest error rates (2.68% with Gabor and 8.80% without Gabor) and the lowest recognition rate. It generally performs worse than PCA, and outperforms PCA only on the CMC results for the evaluation set when Gabor filters are used. Overall, the linear algorithms still perform better than the nonlinear ones.


Table 1. Recognition rates using different face recognition performance metrics

[Columns: Methods; recognition rates with and without Gabor, for the gallery with the evaluating set and with the test set; HTER (%) with and without Gabor. Row values: 66.07, 66.79, 92.50, 63.13, 1.61, 4.72.]

Symbol   Definition
RR       Rank-one recognition rate
VR       Verification rate
HTER     Half total error rate
CMC      Cumulative match curve
ROC      Receiver operating characteristic
EPC      Expected performance curve


The following conclusions are drawn from the experimental results (under equal working conditions):

1) The performance of the linear and nonlinear algorithms depends on several conditions, explained below:

a. The number of classes in a facial recognition system can affect the performance of the linear or nonlinear algorithm used. LDA (a linear algorithm) and KFA (a nonlinear algorithm) explicitly provide the best discrimination among classes.

b. Preprocessing with Gabor filters increases the recognition rate of both the linear and the nonlinear algorithms.

c. When more test images are used after preprocessing with Gabor wavelets, the recognition rate of all the algorithms decreases, but the decrease is larger for the PCA-based algorithms than for the LDA-based ones. This shows that increasing the number of test images can reduce the recognition rate of all the algorithms, but the LDA-based (class-based) algorithms are less affected than the PCA-based ones.

2) Overall, the linear algorithms perform better than the nonlinear ones.


6. CONCLUSION 

The results show that the number of classes and test images in a facial recognition system can affect the recognition rate of a particular algorithm. Incorporating the Gabor image representation with linear and nonlinear algorithms increases their recognition rate. In this work, the linear subspace techniques tended to perform better than the nonlinear ones. The research should be useful to any organization that wishes to develop a facial recognition system and to know which face recognition algorithms have a better recognition rate. This study will also benefit prospective researchers who would like to undertake similar studies.

This work compares linear and nonlinear face recognition algorithms. The research is concerned only with 2D holistic face recognition algorithms. Future work could apply linear and nonlinear algorithms to 2D local appearance-based face recognition.